On Efficient Handling of Continuous Attributes in Large Data Bases

Author

  • Hung Son Nguyen
Abstract

Some data mining techniques, such as discretization of continuous attributes or decision tree induction, are based on searching for an optimal partition of data with respect to some optimization criterion. We investigate the problem of searching for an optimal binary partition of a continuous attribute domain in the case of large data sets stored in relational databases (RDB). The critical factor for the time complexity of algorithms solving this problem is the number of I/O database operations necessary to construct such partitions. In our approach the basic operators are defined by queries on the number of objects characterized by means of real-valued intervals of continuous attributes. We assume that the answer time for such queries does not depend on the interval length. The straightforward approach to optimal partition selection (with respect to a given measure) requires $O(N)$ basic queries, where $N$ is the number of preassumed partition parts in the search space. We show properties of the basic optimization measures that make it possible to reduce the size of the search space. Moreover, we prove that using only $O(\log N)$ simple queries, one can construct a partition very close to optimal.
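The straightforward approach mentioned in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `CountOracle` class stands in for the database and answers interval-count queries, the weighted class entropy stands in for "a given measure", and `best_cut_linear` simply evaluates every candidate cut, so the number of queries grows linearly with the number of candidates.

```python
import math
from bisect import bisect_left


class CountOracle:
    """Hypothetical stand-in for the database: answers
    'how many objects of class cls have a value in [lo, hi)?'."""

    def __init__(self, values, labels):
        self.by_class = {}
        for v, c in zip(values, labels):
            self.by_class.setdefault(c, []).append(v)
        for vs in self.by_class.values():
            vs.sort()
        self.queries = 0  # number of basic queries issued so far

    def count(self, cls, lo, hi):
        self.queries += 1
        vs = self.by_class[cls]
        return bisect_left(vs, hi) - bisect_left(vs, lo)


def entropy(counts):
    """Class entropy (in bits) of a list of per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def best_cut_linear(oracle, candidate_cuts, lo, hi):
    """Evaluate every candidate cut using only count queries.

    Returns (cut, score) with the lowest weighted entropy, or None if
    there are no candidates or no objects in [lo, hi).
    """
    classes = sorted(oracle.by_class)
    best = None
    for cut in candidate_cuts:
        left = [oracle.count(c, lo, cut) for c in classes]
        right = [oracle.count(c, cut, hi) for c in classes]
        n = sum(left) + sum(right)
        if n == 0:
            continue
        score = (sum(left) * entropy(left) + sum(right) * entropy(right)) / n
        if best is None or score < best[1]:
            best = (cut, score)
    return best


oracle = CountOracle([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
result = best_cut_linear(oracle, [2.5, 5.0, 10.5], lo=0, hi=100)
```

In this sketch each candidate cut costs two count queries per class, which is the linear cost the abstract sets out to reduce; the logarithmic-query construction itself is not reproduced here.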


Similar resources

Application of Artificial Neural Networks and Support Vector Machines for carbonate pores size estimation from 3D seismic data

This paper proposes a method for the prediction of pore size values in hydrocarbon reservoirs using 3D seismic data. To this end, an actual carbonate oil field in the south-western part of Iran was selected. Taking real geological conditions into account, different models of reservoir were constructed for a range of viable pore size values. Seismic surveying was performed next on these models. F...


A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...


Analysis of the Concept of Occupational Therapy Handling in Children with Cerebral Palsy: A Hybrid Study

Objective: This study aimed to analyze the concept of occupational therapy handling in children with cerebral palsy from the perspective of occupational therapy instructors and clinicians in Iran. Materials & Methods: In this qualitative study, a hybrid model was used to clarify the concept of handling through three phases. For the theoretical phase, attributes of handling were recognized thr...


Learning of Inexact Rules by the FISH-NET Algorithm from Low Quality Data

We present an algorithm, the FISH-NET algorithm, for deriving classification/forecasting rules from large data bases of low quality data. The attributes are assumed to be continuous, numeric variables. The algorithm works on the field of the attributes, rather than on individual point values, and is linear in both the number of attributes and the number of instances. The algorithm has been tested ...


A DEA-based Approach for Multi-objective Design of Attribute Acceptance Sampling Plans

Acceptance sampling (AS), as one of the main fields of statistical quality control (SQC), involves a system of principles and methods to make decisions about accepting or rejecting a lot or sample. For attributes, the design of a single AS plan generally requires determination of sample size and acceptance number. Numerous approaches have been developed for optimal selection of design parameters...



Journal:
  • Fundam. Inform.

Volume 48

Publication date: 2001